ABSTRACT

An  auxiliary  service  unit  is  normally  idle,  or  in  cold 
standby.   If  a  demand  for  the  unit's  service  occurs,  the  unit 
must  be  available  to  satisfy  it,  or  else  "catastrophe"  occurs. 
Policies  for  periodic  inspection  and  maintenance  of  such  a  unit 
are  derived  in  this  paper  that  maximize  the  expected  time  until  a 
catastrophe  occurs.   The  policies  recognize  that  inspection, 
maintenance,  and  repair  periods  are  of  non-zero  duration,  during 
which the unit is vulnerable. They also account for the possibility
of hazardous inspection that may damage the unit, and for various
forms of imperfect repair.

Important examples occur in the nuclear power industry: a unit
may be a pump or an emergency diesel generator, and a demand may
be caused by an initiating event such as a pipe break or loss of
off-site power; "catastrophe" equates to a loss-of-coolant accident
or meltdown. Other examples occur in the military and in emergency
services to hospitals.


Key words: Reliability, availability, maintenance, time to
failure, inspection, Markov decision process, nuclear safety,
standby redundancy.


1.   INTRODUCTION 

It is common practice to improve the reliability of a system
by installing cold standby units, which are brought into operation
only when a standard operating system fails. In particular, diesel
generators in cold standby may be used to scram a reactor in case
of a coolant pipe breaking or some other failure in a nuclear power
plant. Other examples occur in hospital power supplies and military
hardware. If such a standby system fails to operate when it is
required, the consequences could be catastrophic. The times when
there is a need for the standby unit are called initiating events.
If the standby system is in a failed state when an initiating event
occurs, then a catastrophic event is said to occur.

It is necessary to inspect and maintain the standby system
from time to time. If inspection reveals it to be in an
unsatisfactory state, repairs are made. The idea is that the standby
unit can go down even when it is not operating, and this will
cause it to fail to operate the next time it is needed.

The  following  policy  has  been  proposed  for  the  inspection  of 
diesel  generators  in  a  reactor.   After  a  generator  is  found  to 
be  down  on  inspection  and  is  repaired,  it  undergoes  K  inspections 
at  short  intervals  of  time.   If  it  is  found  to  be  up  at  each  of 
these  short  inspections,  then  it  is  inspected  at  long  intervals 
thereafter  until  it  is  found  to  be  down.   Whenever  a  generator 
is  found  to  be  down  and  is  repaired,  inspections  start  with  the 
K short inspection intervals again. This type of inspection
policy reflects the idea that after the system is repaired it
should be inspected more often for a while to ensure it was
repaired correctly. In Section 2 we present a model for this
inspection policy and derive an expression for the expected time
to a catastrophic event.

In  Sections  3  through  5  we  will  use  various  Markov  decision 
and  renewal  theoretic  formulations  of  the  problem  to  investigate 
the  forms  of  the  optimal  inspection  policies  which  maximize  the 
expected  time  until  a  catastrophic  event  occurs.   This  will  show 
us  how  certain  assumptions  about  inspection  and  repair  of  the 
standby  system  affect  the  form  of  the  inspection  policy. 

Almost all the previous work on inspecting a single standby
unit uses a cost criterion. Barlow and Proschan [2] described
the basic average cost per unit time model with accurate
instantaneous inspection and faultless repair, while Luss and Kander
[9] allowed for non-zero inspection times. Wattanapanom and
Shaw [20] studied the problem when inspection is hazardous, so
that it is possible for the inspection to cause the unit to fail.
Nakagawa [11] looked at the probability that the standby system
will work at an initiating event, while Butler [3] maximized
the expected lifetime of the standby unit, but did not allow
repairs. His model allowed the standby unit to be in more than one
'up' state, which are distinguishable only upon inspection. This
connects with the work on partially observable Markov decision
processes [1,10,16], and in particular the problem of optimal
inspection and repair of a deteriorating process with imperfect
information introduced by Ross [13] and generalized by White [21],
Rosenfield [12], Luss [8], Sengupta [15], Suzuki [17], and Wong
[19]. In these papers, a system can be in more than one state,
but which one is known only imperfectly or upon inspection.

Our models of the inspection and repair of the standby system
allow for non-zero inspection-maintenance times and non-zero
repair periods, but we ignore the time the unit is in use. The
idea is that during inspection-maintenance and repair the unit
cannot react to an initiating event and so these are critical
times for the system, whereas we assume that the time the standby
system is actually in use is so small it can be neglected. We also
allow for imperfect repair and hazardous inspection, so that even
if the unit is up on inspection, it might be down immediately
after. Thus we explicitly represent possible mistakes in
inspection, and allow for incorrectly identifying the unit as
working when in fact it was down. Another model considered allows
the unit to be in one of two 'up' states, which are
indistinguishable on inspection, but have different failure rates.
This is intended to incorporate the idea that a repair might put
right the superficial cause of the unit's failure, but not deal
with the underlying problem, which will recur.

In Section 3, we introduce our basic discrete time models
where the unit can only be either 'up' or 'down'. The times
between initiating events are assumed to have a geometric
distribution. We describe the case where successfully dealt with
initiating events are recorded as showing the unit was working
at that time. By modelling this as a Markov decision process
we can find the form of the optimal inspection policy to maximize
expected time to a catastrophic event. We compare this with the
case where we ignore any information from successfully dealt
with initiating events. We also look at the expected times until
a catastrophic event under different policies, and optimize the
probability that the system will last at least a fixed number
of time periods. Section 4 describes the equivalent continuous
time model and shows how the discrete time results are replicated
if the lifetime of the unit is exponential and the initiating
events occur according to a Poisson process. We also investigate
the optimal inspection policy for general lifetime distributions.
Section 5 generalizes the discrete time model to allow the unit
to be in two 'up' states. In certain cases the optimal inspection
policy for this model has quite short inspection periods
immediately after a repair, which then lengthen as further
inspections suggest the system is in the "better" up state.


2.   CONTINUOUS  TIME  MODEL  WITH  TWO-UP  STATES  AND  SHORT-LONG 
INSPECTION  POLICY 

Assume the system can be in one of two up-states j = 1,2
until it fails. The two up-states are indistinguishable upon
inspection. After a repair the system goes to up-state j with
probability $\pi_j$ and remains there until it fails. After a repair
the conditional distribution of the time to failure given it is
in up-state j is $G_j$, independent of the past.

After a repair the system is inspected and maintained at K
short intervals of length S. If the system is found to be up
at each of the K short inspection intervals, then future
inspections occur at long intervals of length L > S. If the system
is found to be down upon inspection, it is repaired and then
inspected at K short inspection intervals again before the long
inspection intervals begin. If the system is found to be up
upon inspection, routine maintenance is performed. Given the
system is in up-state j, the conditional distribution of the
time to failure after an inspection is $F_j$, independent of the
past. Some reasonable and tractable examples of distributions
$F_j$ and $G_j$ are the exponential, and the exponential with a
probability atom at the origin reflecting hazardous inspection or
faulty repair.

Inspection-maintenance takes M units of time and repair
takes R units of time. Initiating events occur according to a
Poisson process with rate $\nu$. The system is unable to respond
to an initiating event during inspection-maintenance or repair.
A catastrophic event is said to occur if an initiating event
occurs when the system has failed or is being inspected,
maintained, or repaired. Let T denote the time of the first
catastrophic event. We will derive an expression for the
expected value of T.

Let $f(j,k) = E_{j,k}[T]$ denote the expected time to the first
catastrophic event given k = 0,1,...,K short inspection periods
have already successfully taken place and the system is in
up-state j. Let $f(j,\ell) = E_{j,\ell}[T]$ denote the expected time
to the first catastrophic event given a successful inspection has
just taken place, the next inspection period is long, and the
system is in up-state j.

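A probabilistic argument gives the following system of
equations, where $\bar F_j(S) = 1 - F_j(S)$ and
$\bar G_j(S) = 1 - G_j(S)$:

$$
\begin{aligned}
f(j,0) ={}& \bar G_j(S)\, e^{-\nu M}\,\{S + M + f(j,1)\} \\
&+ \bar G_j(S) \int_0^{M} (S+u)\,\nu e^{-\nu u}\,du \\
&+ \int_0^{S} G_j(du) \int_0^{S-u+R} (u+z)\,\nu e^{-\nu z}\,dz \\
&+ \int_0^{S} G_j(du)\, e^{-\nu(S-u+R)}
   \Big[S + R + \sum_{i=1}^{2} \pi_i f(i,0)\Big];
\end{aligned}
\tag{2.1}
$$

for $1 \le k \le K-1$,

$$
\begin{aligned}
f(j,k) ={}& \bar F_j(S)\, e^{-\nu M}\,\{S + M + f(j,k+1)\} \\
&+ \bar F_j(S) \int_0^{M} (S+u)\,\nu e^{-\nu u}\,du \\
&+ \int_0^{S} F_j(du) \int_0^{S-u+R} (u+z)\,\nu e^{-\nu z}\,dz \\
&+ \int_0^{S} F_j(du)\, e^{-\nu(S-u+R)}
   \Big[S + R + \sum_{i=1}^{2} \pi_i f(i,0)\Big],
\end{aligned}
\tag{2.2}
$$

where $f(j,K) = f(j,\ell)$;

$$
\begin{aligned}
f(j,\ell) ={}& \bar F_j(L)\, e^{-\nu M}\,[L + M + f(j,\ell)] \\
&+ \bar F_j(L) \int_0^{M} (L+u)\,\nu e^{-\nu u}\,du \\
&+ \int_0^{L} F_j(du) \int_0^{L-u+R} (u+z)\,\nu e^{-\nu z}\,dz \\
&+ \int_0^{L} F_j(du)\, e^{-\nu(L-u+R)}
   \Big[L + R + \sum_{i=1}^{2} \pi_i f(i,0)\Big].
\end{aligned}
\tag{2.3}
$$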


After some simplification, equations (2.1)-(2.3) become

$$f(j,0) = a_0(j,S) + p_0(j,S)\,f(j,1) + c_0(j,S)\,\pi f(0); \tag{2.4}$$

for $1 \le k \le K-1$,

$$f(j,k) = a(j,S) + p(j,S)\,f(j,k+1) + c(j,S)\,\pi f(0); \tag{2.5}$$

$$f(j,\ell) = a(j,L) + p(j,L)\,f(j,\ell) + c(j,L)\,\pi f(0) \tag{2.6}$$

where

$$\pi f(0) = \sum_{i=1}^{2} \pi_i f(i,0);$$

$$p_0(j,S) = \bar G_j(S)\, e^{-\nu M}; \tag{2.7}$$

$$c_0(j,S) = e^{-\nu(S+R)} \int_0^{S} e^{\nu u}\, G_j(du); \tag{2.8}$$

$$a_0(j,S) = \frac{1}{\nu}\,[1 - p_0(j,S) - c_0(j,S)]
   + \bar G_j(S)\,S + \int_0^{S} u\, G_j(du); \tag{2.9}$$

$$p(j,t) = \bar F_j(t)\, e^{-\nu M}; \tag{2.10}$$

$$c(j,t) = e^{-\nu(t+R)} \int_0^{t} e^{\nu u}\, F_j(du); \tag{2.11}$$

$$a(j,t) = \frac{1}{\nu}\,[1 - p(j,t) - c(j,t)]
   + \bar F_j(t)\,t + \int_0^{t} u\, F_j(du). \tag{2.12}$$


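In the special case in which $F_j$ has an exponential
distribution with an atom at the origin,

$$
F_j(t) = \begin{cases}
0 & \text{if } t < 0, \\
(1-\alpha_j) + \alpha_j\,[1 - e^{-\delta_j t}] & \text{if } t \ge 0,
\end{cases}
$$

then (for $\delta_j \ne \nu$)

$$p(j,t) = \alpha_j\, e^{-\delta_j t}\, e^{-\nu M} \tag{2.13}$$

$$c(j,t) = e^{-\nu(t+R)} \Big\{ (1-\alpha_j)
   + \alpha_j\, \frac{\delta_j}{\delta_j - \nu}\,
     [1 - e^{-(\delta_j - \nu)t}] \Big\} \tag{2.14}$$

$$a(j,t) = \frac{1}{\nu}\,[1 - p(j,t) - c(j,t)]
   + \alpha_j\, \frac{1}{\delta_j}\,[1 - e^{-\delta_j t}]. \tag{2.15}$$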
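The closed forms (2.13)-(2.15) can be checked against the integral
definitions (2.10)-(2.12). The following is a minimal sketch, not the
authors' code: the function names, the midpoint quadrature, and the
parameter values used in any call are assumptions made for
illustration. The atom at the origin contributes $(1-\alpha_j)$ to the
integral in (2.11) and nothing to the integral in (2.12).

```python
import math

def p_closed(alpha, delta, nu, M, t):
    """(2.13): unit survives to t, then no initiating event during inspection."""
    return alpha * math.exp(-delta * t) * math.exp(-nu * M)

def c_closed(alpha, delta, nu, R, t):
    """(2.14), rearranged to avoid large exponentials; requires delta != nu."""
    atom = (1.0 - alpha) * math.exp(-nu * t)
    cont = alpha * delta / (nu - delta) * (math.exp(-delta * t) - math.exp(-nu * t))
    return math.exp(-nu * R) * (atom + cont)

def a_closed(alpha, delta, nu, M, R, t):
    """(2.15)."""
    p = p_closed(alpha, delta, nu, M, t)
    c = c_closed(alpha, delta, nu, R, t)
    return (1.0 - p - c) / nu + alpha / delta * (1.0 - math.exp(-delta * t))

def midpoint(f, lo, hi, n=20000):
    """Plain midpoint rule, used only as an independent numerical check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def c_numeric(alpha, delta, nu, R, t):
    """(2.11) evaluated directly: atom at u = 0 plus the continuous part."""
    dens = lambda u: math.exp(nu * u) * alpha * delta * math.exp(-delta * u)
    return math.exp(-nu * (t + R)) * ((1.0 - alpha) + midpoint(dens, 0.0, t))

def a_numeric(alpha, delta, nu, M, R, t):
    """(2.12) evaluated directly."""
    p = alpha * math.exp(-delta * t) * math.exp(-nu * M)          # (2.10)
    c = c_numeric(alpha, delta, nu, R, t)
    mean_part = midpoint(lambda u: u * alpha * delta * math.exp(-delta * u), 0.0, t)
    return (1.0 - p - c) / nu + alpha * math.exp(-delta * t) * t + mean_part
```

Calling `c_closed` and `c_numeric` (or `a_closed` and `a_numeric`) with
the same arguments should give values that agree to quadrature
accuracy.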


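Solving equations (2.4)-(2.6) recursively leads to the
following expression for the expected time to the first
catastrophic event given the system has just been repaired:

$$\pi f(0) = \frac{\mathrm{NUM}}{\mathrm{DEN}} \tag{2.16}$$

where

$$\mathrm{NUM} = \sum_{j=1}^{2} \pi_j\,
   [a_0(j,S) + p_0(j,S)\, g_N(K-1)] \tag{2.17}$$

where

$$g_N(K-1) = \frac{1 - p(j,S)^{K-1}}{1 - p(j,S)}\, a(j,S)
   + p(j,S)^{K-1}\, \frac{a(j,L)}{1 - p(j,L)}; \tag{2.18}$$

$$\mathrm{DEN} = \sum_{j=1}^{2} \pi_j\,
   [1 - \{c_0(j,S) + p_0(j,S)\, g_D(K-1)\}] \tag{2.19}$$

$$g_D(K-1) = \frac{1 - p(j,S)^{K-1}}{1 - p(j,S)}\, c(j,S)
   + p(j,S)^{K-1}\, \frac{c(j,L)}{1 - p(j,L)}. \tag{2.20}$$

Here $g_N(K-1)$ and $g_D(K-1)$ also depend on the up-state j; the
dependence is suppressed in the notation.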
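As an illustration of how (2.16)-(2.20) might be evaluated for the
exponential-with-atom case, here is a minimal sketch. It is not the
authors' program. All names are assumptions; `okr` and `oki` stand for
the atom parameters of $G_j$ and $F_j$ respectively. The second
function iterates (2.4)-(2.6) directly and serves only as a
cross-check of the closed form.

```python
import math

def pca(alpha, delta, nu, M, R, t):
    """(p, c, a) of (2.13)-(2.15) for survival alpha*exp(-delta*t); needs delta != nu."""
    p = alpha * math.exp(-delta * t - nu * M)
    c = math.exp(-nu * R) * ((1 - alpha) * math.exp(-nu * t)
                             + alpha * delta / (nu - delta)
                             * (math.exp(-delta * t) - math.exp(-nu * t)))
    a = (1 - p - c) / nu + alpha / delta * (1 - math.exp(-delta * t))
    return p, c, a

def expected_time(S, L, K, okr, oki, deltas, pis, nu, M, R):
    """pi f(0) = NUM / DEN from (2.16)-(2.20); K >= 1 short intervals."""
    num, den = 0.0, 1.0
    for delta, pi in zip(deltas, pis):
        p0, c0, a0 = pca(okr, delta, nu, M, R, S)   # G_j: first interval after repair
        pS, cS, aS = pca(oki, delta, nu, M, R, S)   # F_j: later short intervals
        pL, cL, aL = pca(oki, delta, nu, M, R, L)   # F_j: long intervals
        geo = (1 - pS ** (K - 1)) / (1 - pS)
        g_n = geo * aS + pS ** (K - 1) * aL / (1 - pL)   # (2.18)
        g_d = geo * cS + pS ** (K - 1) * cL / (1 - pL)   # (2.20)
        num += pi * (a0 + p0 * g_n)                      # (2.17)
        den -= pi * (c0 + p0 * g_d)                      # (2.19), using sum(pis) == 1
    return num / den

def expected_time_by_iteration(S, L, K, okr, oki, deltas, pis, nu, M, R, sweeps=2000):
    """Cross-check: iterate (2.4)-(2.6) to a fixed point instead of solving them."""
    f0 = [0.0 for _ in deltas]
    for _ in range(sweeps):
        pif0 = sum(pi * f for pi, f in zip(pis, f0))
        new_f0 = []
        for delta in deltas:
            p0, c0, a0 = pca(okr, delta, nu, M, R, S)
            pS, cS, aS = pca(oki, delta, nu, M, R, S)
            pL, cL, aL = pca(oki, delta, nu, M, R, L)
            f = (aL + cL * pif0) / (1 - pL)          # (2.6) solved for f(j, l)
            for _k in range(K - 1):                  # (2.5) for k = K-1, ..., 1
                f = aS + pS * f + cS * pif0
            new_f0.append(a0 + p0 * f + c0 * pif0)   # (2.4)
        f0 = new_f0
    return sum(pi * f for pi, f in zip(pis, f0))
```

Any numerical inputs are the caller's choice. For example, the
hypothetical rates `deltas=(9/230, 1/30)` with `pis=(0.9, 0.1)` give a
mean up-time of 26 weeks, and with a very large S (no inspections in
practice) the function then returns approximately `okr * 26 + 1/nu`.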


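EXAMPLE. The rate of initiating events is $\nu = 0.1$ per week, and
$\pi_1 = 0.9 = 1 - \pi_2$. The length of an inspection-maintenance
period M is [value illegible in source] weeks. A repair period, R,
is [value illegible in source] weeks. The lifetime distributions are

$$
F_j(t) = \begin{cases}
0 & \text{if } t < 0, \\
(1-\mathrm{OKI}) + \mathrm{OKI}\,[1 - e^{-\delta_j t}] & \text{if } t \ge 0,
\end{cases}
$$

$$
G_j(t) = \begin{cases}
0 & \text{if } t < 0, \\
(1-\mathrm{OKR}) + \mathrm{OKR}\,[1 - e^{-\delta_j t}] & \text{if } t \ge 0.
\end{cases}
$$

Assume $\delta_1 = 9/[\text{illegible}]$ per week and
$\delta_2 = 1/[\text{illegible}]$ per week. Note that after a
repair the conditional expected time to system failure given the
system is up is $\pi_1 \frac{1}{\delta_1} + \pi_2 \frac{1}{\delta_2} = 26$
weeks. Thus, if after a repair no inspections are done, then the
expected time to a catastrophic event is
$(\mathrm{OKR})(26) + \frac{1}{\nu} = (\mathrm{OKR})(26) + 10$ weeks.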

An exploratory numerical study was conducted of the best
values of S, L, and K for various values of OKI and OKR. We
restricted our attention to the case in which inter-inspection
periods are integer numbers of weeks. Equations (2.16)-(2.20)
were evaluated numerically for various parameter values. Some
results are summarized in Table 1.

                              Table 1

                                               Best      Expected time if
   OKI     OKR     Best S   Best K   Best L    πf(0)     no inspections
   ---     ---     ------   ------   ------    -----     ----------------
   0.9     0.9     1        2        2         69.24     33.4
   0.5     0.5     ∞        -        -         23        23
   0.5     0.9     9        1        1         38.39     33.4
   0.9     0.5     1        1        3 or 4    50.63     23


If the quality of the repair is better than the quality of
inspection (OKR > OKI) then it appears to be better not to
inspect often initially after a repair but then to inspect more
often as time goes on. If OKI > OKR then it appears to be
better to inspect soon after a repair and, if the system is up
at inspection, not to inspect for a longer period of time
thereafter. If both repair and inspection are of poor quality
then it appears to be better not to do anything. Note that the
expected time to a catastrophe seems to be more sensitive to
OKI than to OKR.

In the remainder of the paper we will study optimal
inspection policies.

3.   DISCRETE  TIME,  ONE-UP-STATE,  MARKOV  DECISION  PROCESS  MODELS

MODEL  1 

In the first model, the standby unit can be either 'up' or
'down' when it is not in operation; and, if n basic time periods
(e.g., days) have elapsed since the unit was installed, $s_n$ is
the probability that it will be 'up' at the next time period
given that it is 'up' in this (the n-th) time period. Once the
unit goes 'down' it remains 'down' until either it is
successfully repaired or else a catastrophic initiating event occurs.
Each time period, the operator can inspect the unit, repair it,
or do nothing. If the inspection finds the unit is 'up', no
repairs are made, but there is a probability (1-i) that the
inspection was actually hazardous or damaging, and so the unit
is 'down' immediately after inspection. An inspection which
finds the unit up takes M periods, where M need not be an integer;
during this period the unit cannot respond to an initiating event.
If, on inspection, the unit is found in the down state, a repair
is attempted, which with probability r will return the unit to
the 'up' state and with probability (1-r) leaves it in the down
state; this takes a total time of R periods to perform ($R \ge M$);
again the unit cannot respond to an initiating event during this
period. If the operator decides on a repair without inspection,
the unit is again out of operation for R periods and has
probability r of being in the 'up' state immediately afterwards,
irrespective of whether it was up or down before the repair.

An initiating event, i.e., one that demands the standby
unit's services, occurs at random with probability $\beta$ each
period, i.e., according to a Bernoulli trials process, so the times
between events are independent and geometric. In this model we
assume the operator is aware of those initiating events to which
the standby unit responded satisfactorily. This implies the
unit was 'up' at that time, and although we neglect the time
it was in operation, we say there is a (1-c) chance that its use
will have caused it to go down by the end of the period. So if it
was used in the n-th period after the unit was installed, there is
a probability c that it will be 'up' at the next period. (If
c = 1, use is not hazardous.) If the standby system is down or is
being inspected or repaired when an initiating event occurs, a
catastrophic event occurs. The objective is to maximize the
expected number of periods until a catastrophic event occurs.

The situation described can be treated as an infinite-state
Markov decision process. The state space is describable as
$S = \{(p,n) : 0 \le p \le 1,\ n = 1,2,\ldots\}$, where p is our
belief that the unit is 'up' this period, and n is the number of
periods since the standby unit was installed. There are three
actions open to us at each state: do nothing, inspect, or repair.
Let V(p,n) be the maximum expected number of periods until a
catastrophic event, given that this is the n-th period since
installation, and p is our belief at this time that the unit is
'up'. Standard dynamic programming arguments [14] show that V(p,n)
satisfies the optimality equation

$$V(p,n) = \max\{W_1(p,n),\ W_2(p,n),\ W_3(p,n)\} \tag{3.1}$$
where:

$$W_1(p,n) = 1 + (1-\beta)\,V(s_n p,\,n+1) + \beta p\,V(c,\,n+1)$$

$$
\begin{aligned}
W_2(p,n) ={}& p\,\big[(1-(1-\beta)^M)/\beta + (1-\beta)^M\,V(i,\,n+M)\big] \\
&+ (1-p)\,\big[(1-(1-\beta)^R)/\beta + (1-\beta)^R\,V(r,\,n+R)\big]
\end{aligned}
$$

$$W_3(p,n) = (1-(1-\beta)^R)/\beta + (1-\beta)^R\,V(r,\,n+R)$$

Note that

$$(1-(1-\beta)^M)/\beta = \beta + 2\beta(1-\beta) + 3\beta(1-\beta)^2
   + \cdots + M(1-\beta)^{M-1}$$

is the expected number of periods to pass, up to a maximum of
M, until an initiating event occurs. $W_i(p,n)$ represents the
payoff from an action; for example $W_1(p,n)$ corresponds to doing
nothing, where with probability $(1-\beta)$ no demand occurs, while
with probability $\beta p$ an initiating event is successfully dealt
with and with probability $(1-p)\beta$ a catastrophic event occurs.
(3.1) is an example of Denardo's contraction operator approach
to dynamic programming [4], and hence the optimal policy is
independent of the past history of the system and consists of
inspecting in state (p,n) if $W_2(p,n) > \max\{W_1(p,n), W_3(p,n)\}$,
repairing if $W_3(p,n) > \max\{W_1(p,n), W_2(p,n)\}$, otherwise doing
nothing.
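The identity for the truncated geometric mean can be confirmed
numerically. The sketch below is illustrative only; the function
names are assumptions, and the term-by-term sum is written for
integer M (the closed form is the one the text also applies to
non-integer M).

```python
def truncated_mean_direct(beta, M):
    """E[min(T, M)] for geometric T with P(T = k) = beta*(1-beta)**(k-1), integer M >= 1."""
    head = sum(k * beta * (1 - beta) ** (k - 1) for k in range(1, M))
    tail = M * (1 - beta) ** (M - 1)      # no initiating event in the first M-1 periods
    return head + tail

def truncated_mean_closed(beta, M):
    """Closed form (1 - (1-beta)**M) / beta used in W_2 and W_3."""
    return (1 - (1 - beta) ** M) / beta
```

For any integer M >= 1 and 0 < beta < 1 the two functions should
return the same value up to rounding.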

As there is a probability $\beta(1 - \max_k\{s_k\})$ of a
catastrophic event within two periods from any state and under any
policy, we have

$$1/\beta \ \le\ V(p,n) \ \le\ 2/[\beta(1 - \max_k\{s_k\})]. \tag{3.2}$$


It is easier to work with $\hat V(p,n) = V(p,n) - 1/\beta$, which is
the expected extra time until a catastrophic event because there
is a standby unit. (3.1) then becomes

$$\hat V(p,n) = \max\{\hat W_1(p,n),\ \hat W_2(p,n),\ \hat W_3(p,n)\} \tag{3.3}$$

where:

$$\hat W_1(p,n) = p + (1-\beta)\,\hat V(s_n p,\,n+1) + \beta p\,\hat V(c,\,n+1)$$

$$\hat W_2(p,n) = p\,(1-\beta)^M\,\hat V(i,\,n+M) + (1-p)\,(1-\beta)^R\,\hat V(r,\,n+R)$$

$$\hat W_3(p,n) = (1-\beta)^R\,\hat V(r,\,n+R)$$
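The step from (3.1) to (3.3) can be written out as follows. This is a
sketch of the omitted algebra; the hat notation for the shifted
functions is an assumption of this rewrite, with
$\hat V = V - 1/\beta$ and $\hat W_i = W_i - 1/\beta$.

```latex
\begin{aligned}
W_1(p,n) - \tfrac{1}{\beta}
 &= 1 + (1-\beta)\big[\hat V(s_n p, n+1) + \tfrac{1}{\beta}\big]
      + \beta p\,\big[\hat V(c, n+1) + \tfrac{1}{\beta}\big] - \tfrac{1}{\beta} \\
 &= \big[1 + \tfrac{1-\beta}{\beta} + p - \tfrac{1}{\beta}\big]
      + (1-\beta)\,\hat V(s_n p, n+1) + \beta p\,\hat V(c, n+1) \\
 &= p + (1-\beta)\,\hat V(s_n p, n+1) + \beta p\,\hat V(c, n+1), \\[4pt]
W_3(p,n) - \tfrac{1}{\beta}
 &= \tfrac{1-(1-\beta)^R}{\beta}
      + (1-\beta)^R\big[\hat V(r, n+R) + \tfrac{1}{\beta}\big] - \tfrac{1}{\beta}
  = (1-\beta)^R\,\hat V(r, n+R).
\end{aligned}
```

The same cancellation applied to each bracket of $W_2$ gives
$\hat W_2$.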


Lemma 3.1.

If $s_n$ are non-increasing in n then $\hat V(p,n)$ is convex and
non-decreasing in p, and non-increasing in n.


Proof. Apply value iteration to solve (3.3); the iterates
$\hat V_m(p,n)$ satisfy

$$
\hat V_{m+1}(p,n) = \max \begin{cases}
p + (1-\beta)\,\hat V_m(s_n p,\,n+1) + \beta p\,\hat V_m(c,\,n+1), \\
p\,(1-\beta)^M\,\hat V_m(i,\,n+M) + (1-p)\,(1-\beta)^R\,\hat V_m(r,\,n+R), \\
(1-\beta)^R\,\hat V_m(r,\,n+R).
\end{cases}
\tag{3.4}
$$
Let $\hat V_0(p,n) = 0$ for all p and n, which is convex and
non-decreasing in p and non-increasing in n. Since the sum of
convex functions and the maximum of convex functions are convex,
if $\hat V_m(p,n)$ is convex for all p and n so is $\hat V_{m+1}(p,n)$.
Thus by induction $\hat V_m(p,n)$ is convex in p and since, by [14],
$\hat V_m(\cdot,\cdot)$ converges to $\hat V(\cdot,\cdot)$, the solution
of (3.3), this limit function is also convex in p.

Again notice that if $\hat V_m(p,n)$ is non-decreasing in p for all
n, so is $p + (1-\beta)\hat V_m(s_n p,n+1) + \beta p \hat V_m(c,n+1)$,
since $\hat V_m(\cdot,\cdot) \ge 0$, and also
$\max\{p(1-\beta)^M \hat V_m(i,n+M) + (1-p)(1-\beta)^R \hat V_m(r,n+R),\ (1-\beta)^R \hat V_m(r,n+R)\}$
is non-decreasing in p. Hence $\hat V_{m+1}(p,n)$, the maximum of these
two non-decreasing functions, is non-decreasing and the induction
step goes through. In the limit as $m \to \infty$ this proves
$\hat V(p,n)$ is non-decreasing in p.

For the dependence of $\hat V(p,n)$ on n, we again use induction
on the iterates $\hat V_m(p,n)$: notice that (3.4) implies

$$
\begin{aligned}
\hat V_{m+1}(p,n) - \hat V_{m+1}(p,n+1) \ \ge\ \min\Big\{
 & (1-\beta)\big(\hat V_m(s_n p,n+1) - \hat V_m(s_{n+1}p,n+2)\big)
   + \beta p\big(\hat V_m(c,n+1) - \hat V_m(c,n+2)\big), \\
 & p(1-\beta)^M\big(\hat V_m(i,n+M) - \hat V_m(i,n+1+M)\big) \\
 &\quad + (1-p)(1-\beta)^R\big(\hat V_m(r,n+R) - \hat V_m(r,n+1+R)\big), \\
 & (1-\beta)^R\big(\hat V_m(r,n+R) - \hat V_m(r,n+1+R)\big) \Big\}.
\end{aligned}
\tag{3.5}
$$

Assume $\hat V_m(p,n) \ge \hat V_m(p,n+1)$ for all p and n; then the
fact that $\hat V_m(p,n)$ is non-decreasing in p means that, for all p,

$$
\begin{aligned}
\hat V_m(s_n p,n+1) - \hat V_m(s_{n+1}p,n+2)
 ={}& \big(\hat V_m(s_n p,n+1) - \hat V_m(s_{n+1}p,n+1)\big) \\
 &+ \big(\hat V_m(s_{n+1}p,n+1) - \hat V_m(s_{n+1}p,n+2)\big) \ \ge\ 0.
\end{aligned}
\tag{3.6}
$$

Hence (3.5) gives $\hat V_{m+1}(p,n) - \hat V_{m+1}(p,n+1) \ge 0$ for all
p and n, and the induction hypothesis holds. Thus, the limit
function $\hat V(p,n)$ is also non-increasing in n.

These  results  help  to  describe  the  optimal  policy. 

Theorem 3.1

The optimal policy is given by a set of numbers $p^*_n$,
n = 1,2,..., where, n periods after installing the standby
system, one does nothing in state (p,n) if $p > p^*_n$;
inspects if $p \le p^*_n$ and $(1-\beta)^M \hat V(i,n) \ge (1-\beta)^R \hat V(r,n)$;
and repairs if $p \le p^*_n$ and $(1-\beta)^M \hat V(i,n) < (1-\beta)^R \hat V(r,n)$.
Notice if i > r, then one never repairs, as
$(1-\beta)^M \hat V(i,n) \ge (1-\beta)^R \hat V(r,n)$ for all n.

Proof. Notice that if $(1-\beta)^M \hat V(i,n) \ge (1-\beta)^R \hat V(r,n)$,
then $\hat W_2(p,n) \ge \hat W_3(p,n)$ for all p; otherwise
$\hat W_3(p,n) \ge \hat W_2(p,n)$. Now look at
$\{p \mid \hat W_1(p,n) \le \max_{i=2,3}\{\hat W_i(p,n)\}\}$, which is
the set of states (p,n) where it is not best to do nothing. Since
both $\hat W_2(p,n)$ and $\hat W_3(p,n)$ are linear in p and
$\hat V(p,n)$ is convex, we get for any $p_1$ and $p_2$ in the above
region and any $\lambda$, $0 \le \lambda \le 1$,

$$
\begin{aligned}
\hat W_i(\lambda p_1 + (1-\lambda)p_2,\,n)
 &= \lambda \hat W_i(p_1,n) + (1-\lambda)\hat W_i(p_2,n) \\
 &= \lambda \hat V(p_1,n) + (1-\lambda)\hat V(p_2,n)
 \ \ge\ \hat V(\lambda p_1 + (1-\lambda)p_2,\,n)
\end{aligned}
\tag{3.7}
$$

where i = 2 or 3 depending on which is the maximum. Hence (3.7)
implies $\max_{i=2,3} \hat W_i(\lambda p_1 + (1-\lambda)p_2, n) \ge \hat W_1(\lambda p_1 + (1-\lambda)p_2, n)$
and so the region where it is not best to do nothing is convex.
From (3.3) we have

$$\hat V(0,n) = \max\{(1-\beta)\hat V(0,n+1),\ (1-\beta)^R \hat V(r,n+R)\}. \tag{3.8}$$

If it were best to do nothing at p = 0, this would imply
$\hat V(0,n) = (1-\beta)\hat V(0,n+1)$, which contradicts the fact that
$\hat V(p,n)$ is non-increasing in n. Hence (0,n) is in the convex
region where it is not best to do nothing. Let $p^*_n$ be the
maximum value of p in this region and the result holds.

In fact the model can be rewritten so that the state space
is countable, since not all values of p can occur.
Let $S = \{(m,x,n) : m = 0,1,2,\ldots;\ x = i, r \text{ or } c;\ n = 1,2,3,\ldots\}$
where (m,x,n) is the state when the unit is n periods since
installation and m periods since the end of the last inspection,
repair or successful response to an initiating event; x = i if
this last occurrence was an inspection that found it up; x = r
if it was a repair; and x = c if it was a successfully dealt
with initiating event. The probability p that the unit is up
in this state is $p(m,x,n) = x \prod_{k=1}^{m} s_{n-k}$, and so the
optimality equation (3.3) becomes

$$
\hat V(m,x,n) = \max \begin{cases}
p(m,x,n) + (1-\beta)\,\hat V(m+1,x,n+1) + \beta\,p(m,x,n)\,\hat V(0,c,n+1); \\
p(m,x,n)\,(1-\beta)^M\,\hat V(0,i,n+M) + (1-p(m,x,n))\,(1-\beta)^R\,\hat V(0,r,n+R); \\
(1-\beta)^R\,\hat V(0,r,n+R)
\end{cases}
\tag{3.9}
$$

and the optimal policy of Theorem 3.1 can be reinterpreted.

Corollary 3.1.

If, at n periods after installation, an initiating event
is successfully dealt with, inspect or repair next in $T_c(n)$
periods unless there is another initiating event before then;
if at n periods after installation, the unit has just been found
to be 'up' on inspection, inspect or repair next in $T_i(n)$
periods unless an initiating event occurs; if at n periods after
installation the unit has just finished a repair, then inspect
or repair in $T_r(n)$ periods unless a prior initiating event
occurs. If i > r one always inspects, otherwise the repair
or inspect decision depends on the number of periods since
installation.

Proof. This is just a matter of pointing out that

$$T_c(n) = \min\{k \mid c\, s_n s_{n+1} \cdots s_{n+k} \le p^*_{n+k}\},$$

$$T_i(n) = \min\{k \mid i\, s_n s_{n+1} \cdots s_{n+k} \le p^*_{n+k}\},$$

$$T_r(n) = \min\{k \mid r\, s_n s_{n+1} \cdots s_{n+k} \le p^*_{n+k}\}.$$
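The three definitions share one computation: start from belief x
(one of c, i, r), multiply by the survival probabilities period by
period, and stop at the first k where the belief is at or below the
control limit. A minimal sketch follows; the function name, the
callables `s` and `p_star`, and the search cap `k_max` are all
hypothetical, and the thresholds $p^*_n$ are taken as given.

```python
def next_action_time(x, s, p_star, n, k_max=10_000):
    """Smallest k with x * s(n) * ... * s(n+k) <= p_star(n+k), or None if not found.

    x       -- belief right after the last event (c, i, or r)
    s       -- callable, s(n) = one-period survival probability at age n
    p_star  -- callable, p_star(n) = control limit of Theorem 3.1 at age n
    """
    belief = x
    for k in range(k_max + 1):
        belief *= s(n + k)
        if belief <= p_star(n + k):
            return k
    return None
```

For a constant survival probability and a constant limit, for
instance `next_action_time(0.95, lambda n: 0.9, lambda n: 0.5, n=1)`,
the search reduces to finding the first power of 0.9 that brings the
belief down to 0.5 or less.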


Notice that $T_c(n)$, $T_i(n)$, $T_r(n)$ reflect the ordering of
c, i and r, so if $c \ge i \ge r$ then $T_c(n) \ge T_i(n) \ge T_r(n)$, etc.
The dependence of this policy on n follows because the
failure rate $(1-s_n)$ is age-dependent. We would expect that if
$s_n$ decreases with n, and consequently the failure rate is
increasing, then $T_c(n)$, $T_i(n)$ and $T_r(n)$ will also be
non-increasing in n. This reflects the fact that in the long run,
the aging of the unit will lead to more frequent inspections. At the
moment we are more interested in the effect of inspections and
repair before aging starts to play a part. The interesting
decision to replace an aging unit will not be analyzed at this
time. From now on, assume that the failure rate is constant,
which leads to the following simplification of Model 1.

MODEL 2

Assume $s_n = s$ for all n in Model 1, and c = i. This
corresponds to thinking of an initiating event successfully dealt
with as an inspection which takes zero time. The state space
becomes $S = \{(m,x) : m = 0,1,2,\ldots;\ x = i \text{ or } r\}$, the
optimality equation (3.9) becomes

$$
\hat V(m,x) = \max \begin{cases}
x s^m + (1-\beta)\,\hat V(m+1,x) + \beta\, x s^m\,\hat V(0,i); \\
x s^m\,(1-\beta)^M\,\hat V(0,i) + (1 - x s^m)\,(1-\beta)^R\,\hat V(0,r); \\
(1-\beta)^R\,\hat V(0,r),
\end{cases}
\tag{3.10}
$$
and the optimal policy is either of the form $\pi_i(T_i,T_r)$ or
$\pi_r(T_i,T_r)$; $\pi_i(T_i,T_r)$ means inspect $T_i$ periods after a
successful response to an initiating event and $T_i$ periods after
the end of an inspection or $T_r$ periods after the end of a repair,
unless another initiating event occurs, whereupon inspect if
$T_i$ more periods elapse without another initiating event.
$\pi_r(T_i,T_r)$ means repair $T_i$ periods after a
successfully-dealt-with initiating event, or $T_r$ periods after the
last repair, unless another initiating event occurs. Notice that one
either always inspects or always repairs depending on the values of
$(1-\beta)^M \hat V(0,i)$ and $(1-\beta)^R \hat V(0,r)$.
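To make (3.10) concrete, the following is a minimal value-iteration
sketch on a truncated state space. It is an illustration under stated
assumptions, not the authors' procedure: the truncation at `m_max`
simply holds the belief fixed once m reaches `m_max` (the text instead
appeals to the bounds of [22]), and all names and default tolerances
are hypothetical.

```python
def solve_model2(beta, i, r, s, M, R, m_max=100, tol=1e-9, max_sweeps=200000):
    """Value iteration for (3.10) on states (m, x), m = 0..m_max, x in {"i", "r"}.

    Returns (V, T): V[x][m] approximates the extra expected time, and T[x] is
    the first m at which doing nothing is no longer optimal (None if never).
    """
    x_val = {"i": i, "r": r}
    disc_M, disc_R = (1 - beta) ** M, (1 - beta) ** R
    V = {x: [0.0] * (m_max + 1) for x in x_val}

    def payoffs(V, x, m):
        p = x_val[x] * s ** m                         # belief that the unit is up
        nxt = V[x][min(m + 1, m_max)]                 # truncation: stay at m_max
        insp = disc_M * V["i"][0]
        rep = disc_R * V["r"][0]
        w1 = p + (1 - beta) * nxt + beta * p * V["i"][0]   # do nothing
        w2 = p * insp + (1 - p) * rep                      # inspect
        w3 = rep                                           # repair
        return w1, w2, w3

    for _ in range(max_sweeps):
        new = {x: [max(payoffs(V, x, m)) for m in range(m_max + 1)] for x in x_val}
        gap = max(abs(new[x][m] - V[x][m]) for x in x_val for m in range(m_max + 1))
        V = new
        if gap < tol:
            break

    T = {}
    for x in x_val:
        T[x] = next((m for m in range(m_max + 1)
                     if payoffs(V, x, m)[0] <= max(payoffs(V, x, m)[1:])), None)
    return V, T
```

A call such as `solve_model2(0.05, 0.9, 0.8, 0.95, 0.2, 0.5)` uses
made-up parameter values; the returned `T["i"]` and `T["r"]` play the
role of $T_i$ and $T_r$ in the policies described above, and adding
$1/\beta$ to `V["r"][0]` recovers the expected time to a catastrophic
event just after a repair.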

Although the state space is infinite we can apply variants
of policy iteration and value iteration which solve the Markov
decision process to find the optimal policy and optimal expected
time to a catastrophic event. For any policy $\pi_i(T_i,T_r)$ there
are only $T_i + T_r + 2$ states the unit can be in. So, for any
given policy, we can calculate the corresponding expected
time. Since the problem is equivalent to one with discount
factor $(1-\beta(1-s))$, we can apply the bounds in White [22] to
find a finite state approximation, whose value is within any
prescribed amount of the optimal value. These bounds tell us
how many states (m,x) we need to consider. The results given
in Table 2 are the optimal policy and optimal expected time for
different values of $\beta$, i, r, s, M and R, together with the
expected times under other policies. The numbers we have chosen
reflect an underlying model in which inspections can be scheduled
at discrete times, say at multiples of a week. However, a
repair or inspection takes only a fraction of this time.
Although our theory was worked out for integer inspection and
repair times, we take the same formula to approximate
non-integer times. The inspection policy $\pi_i(1,0)$ means inspect
one period after the last inspection or last initiating event and
immediately after a repair, while $\pi_r(0,100+)$ means repair
immediately after any initiating event or at least 100 periods
(100+) after a repair.

Notice the optimal policy is almost insensitive to whether
$\beta = 0.05$ or $0.01$, and the expected time to a catastrophic event
is affected more by increases in $i$ than $r$ or even $s$.   The
policy $\pi_i(n,0)$, to inspect immediately after a repair, is optimal
if the probability of a repair not being effective is quite
high, say 0.4.   Similarly, the model suggests one should not
inspect, i.e., use $\pi_r(\cdot,\cdot)$, if inspection is more hazardous than
repair, $i < r$.

MODEL  3. 

We might want to change our criterion from maximizing expected
time until a catastrophic event to maximizing the probability
that the system lasts at least n periods until a
catastrophic event.   This might be the case if the unit is to
be completely replaced after n periods.   If we apply this
criterion to Model 2, $P_n(p)$, the probability that the system lasts
at least n periods before a catastrophic event, given we believe
it is 'up' at present with probability p, satisfies the
optimality equation

$$P_0(p) = 1 \quad \text{for all } p,$$

$$P_{n+1}(p) = \max \begin{cases}
(1-\beta)P_n(sp) + \beta p\,P_n(i) \\
p(1-\beta)^{\bar M} P_{n+1-\bar M}(i) + (1-p)(1-\beta)^{\bar N} P_{n+1-\bar N}(r) \\
(1-\beta)^{\bar N} P_{n+1-\bar N}(r)
\end{cases} \tag{3.11}$$

where $\bar M = \min(M,n+1)$, $\bar N = \min(R,n+1)$.   The optimal policy is
again of a control-limit type.

Theorem 3.2.

The optimal policy to maximize the probability of lasting
n periods is given by the sequence $p_1^*, p_2^*, \ldots, p_n^*$, where with k
periods to go, do nothing if $p > p_k^*$, inspect or repair if
$p \le p_k^*$; repair if $(1-\beta)^{\bar N-\bar M} P_{n+1-\bar N}(r) \ge P_{n+1-\bar M}(i)$, and inspect
otherwise.

Proof.   As in Theorem 3.1, prove by induction that $P_n(p)$ is
convex and non-decreasing in p and non-increasing in n.   The
convexity of $P_n(p)$ and the linearity of the second two terms
in the maximization in (3.11) then gives the result.

If the state space is changed to $S = \{(m,x),\ m = 0,1,2,\ldots,\ x = i \text{ or } r\}$,
by noting $p = x s^m$ at $(m,x)$, the obvious change occurs in
the optimal policy.   In Table 3 we compare the maximum
chance of lasting n periods before a catastrophic event for
n = 10, 50 and 200 with the same chance under the policy $\pi^*$
that maximizes the expected time to a catastrophic failure.
These figures are similar to those given for Model 2 except
that the length of period is 1/10 of that there.   So we can
think of the probabilities as those of lasting 10, 50 or 100
weeks without a catastrophic failure.   The optimal policy for
maximizing expected time until failure does very well in almost
all cases.
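The recursion (3.11) can be evaluated directly on the reduced state space $(m,x)$, where $p = x s^m$. The following Python sketch is illustrative only: the function name, the parameter values and the restriction to positive integer M and R (so that every action shortens the remaining horizon) are assumptions made here, not part of the original analysis.

```python
from functools import lru_cache


def make_survival_solver(beta, s, i, r, M, R):
    """Return P(n, m, x): the best chance of lasting n more periods.

    State (m, x) stands for the belief p = x_value * s**m, where x is "i"
    (last renewal was an inspection or a handled initiating event) or "r"
    (last renewal was a repair). M and R are assumed to be positive integers.
    """
    up_after = {"i": i, "r": r}

    @lru_cache(maxsize=None)
    def P(n, m, x):
        if n <= 0:
            return 1.0  # P_0(p) = 1 for all p
        p = up_after[x] * s ** m
        m_bar, n_bar = min(M, n), min(R, n)  # durations truncated at the horizon
        # Do nothing: no initiating event (belief decays), or an event that is handled.
        wait = (1 - beta) * P(n - 1, m + 1, x) + beta * p * P(n - 1, 0, "i")
        # Repair without inspection: vulnerable for n_bar periods.
        repair = (1 - beta) ** n_bar * P(n - n_bar, 0, "r")
        # Inspect: found up with probability p, otherwise a repair follows.
        inspect = p * (1 - beta) ** m_bar * P(n - m_bar, 0, "i") + (1 - p) * repair
        return max(wait, inspect, repair)

    return P


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    P = make_survival_solver(beta=0.05, s=0.95, i=0.9, r=0.8, M=1, R=2)
    for n in (10, 20, 40):
        print(n, round(P(n, 0, "i"), 4))
```

Memoizing on $(n, m, x)$ keeps the work polynomial in n, since at most n values of m can be reached with n periods to go.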

Model 4.

Suppose any information derived from having successfully
dealt with initiating events, as in Model 2, were ignored;
what changes would occur?   We can no longer model this as a
Markov decision process period by period since in these we cannot
ignore information we know.   However, we can construct a renewal
theory model, for each end of inspection or end of repair is
a type of renewal point.   Thus we can define $V_i$, $V_r$ as the maximum
expected time to a catastrophic event starting immediately
after an inspection ($V_i$) or a repair ($V_r$).   The rest of the model
is the same as Model 2, with i, r, s, M, R having the same
meaning as there.   The optimality equation is then

$$V_i = \max_{T_i}\Big\{ L_i(T_i) + i s^{T_i}\big((1-(1-\beta)^M)/\beta + (1-\beta)^M V_i\big)
 + p_i(T_i)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big\}$$

$$V_r = \max \begin{cases}
\max_{T_r}\Big[ L_r(T_r) + r s^{T_r}\big((1-(1-\beta)^M)/\beta + (1-\beta)^M V_i\big)
 + p_r(T_r)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big] \\
\max_{W_r}\Big[ L_r(W_r) + \big(r s^{W_r} + p_r(W_r)\big)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big]
\end{cases} \tag{3.12}$$
where

$$L_p(T) = T - \sum_{i=0}^{T-2} [1 - p s^i]\,[1 - (1-\beta)^{T-i-1}]$$

is the expected number of periods, up to a maximum of T, until a
catastrophic event occurs, if p is the probability the unit is up
at the start of the first period; and

$$p_x(T) = (1 - x s^T) - \sum_{i=0}^{T-1} [\beta(1-\beta)^i]\,[1 - x s^{T-i-1}]$$

is the probability that after T periods the unit is down but
no catastrophic event has occurred, given that initially it was
up with probability x and down with probability 1-x.   Again it
is easier to work with $\tilde V_x = V_x - 1/\beta$, and the arguments of Markov
renewal programming [7] show that the optimal policy is either
$\pi_i(T_i,T_r)$, i.e., inspect $T_i$ after last inspection and $T_r$ after
last repair, or $\pi_r(W_r)$, i.e., repair $W_r$ after last repair.

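Using (3.12) we can calculate $\tilde V_i$, $\tilde V_r$ under these policies.   For
$\pi_i(T_i,T_r)$

$$\tilde V_r = \frac{r(1-s^{T_r})\,(1-(1-\beta)^M i s^{T_i}) + i(1-s^{T_i})\,(1-\beta)^M r s^{T_r}}
{(1-s)\,\big[(1-(1-\beta)^M i s^{T_i})\,(1-(1-\beta)^R p_r(T_r)) - (1-\beta)^{M+R} p_i(T_i)\, r s^{T_r}\big]} \tag{3.13}$$

while under $\pi_r(W_r)$

$$\tilde V_r = \frac{r(1-s^{W_r})}{(1-s)\,\big(1-(1-\beta)^R (r s^{W_r} + p_r(W_r))\big)}. \tag{3.14}$$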


We  calculate  the  optimal  policy  for  the  examples  we  did  in 
Model  2,  and  so  it  is  useful  to  compare  the  results  with  those 
given  there.   The  results  can  be  found  in  Table  4. 
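A fixed policy in this renewal model can be evaluated by treating (3.12), without the maximizations, as linear equations in $V_i$ and $V_r$. The Python sketch below does this and then searches a grid of candidate intervals. The function names, the grid bound `t_max` and the parameter values are hypothetical, and the formulas for $L_p(T)$ and $p_x(T)$ follow the reading of the definitions given above.

```python
def L(p, T, beta, s):
    """Expected periods (at most T) before a catastrophe; unit up w.p. p at the start."""
    return T - sum((1 - p * s**k) * (1 - (1 - beta) ** (T - k - 1)) for k in range(T - 1))


def p_down(x, T, beta, s):
    """Probability the unit is down after T periods with no catastrophe so far."""
    catastrophe = sum(beta * (1 - beta) ** k * (1 - x * s ** (T - k - 1)) for k in range(T))
    return (1 - x * s**T) - catastrophe


def value_repair_only(W, beta, s, r, R):
    """V_r under pi_r(W): repair W periods after the last repair, never inspect."""
    q = r * s**W + p_down(r, W, beta, s)  # probability of no catastrophe in W periods
    dR = (1 - beta) ** R
    return (L(r, W, beta, s) + q * (1 - dR) / beta) / (1 - q * dR)


def values_inspect(Ti, Tr, beta, s, i, r, M, R):
    """(V_i, V_r) under pi_i(Ti, Tr), from the two linear equations in (3.12)."""
    dM, dR = (1 - beta) ** M, (1 - beta) ** R

    def row(x, T):
        up, down = x * s**T, p_down(x, T, beta, s)
        const = L(x, T, beta, s) + up * (1 - dM) / beta + down * (1 - dR) / beta
        return const, up * dM, down * dR  # V_x = const + a * V_i + b * V_r

    ci, ai, bi = row(i, Ti)
    cr, ar, br = row(r, Tr)
    det = (1 - ai) * (1 - br) - bi * ar  # Cramer's rule on the 2x2 system
    return (ci * (1 - br) + bi * cr) / det, (cr * (1 - ai) + ar * ci) / det


def best_policy(beta, s, i, r, M, R, t_max=40):
    """Grid search over both policy families, comparing V_r (start after a repair)."""
    candidates = [(value_repair_only(W, beta, s, r, R), "pi_r", (W,))
                  for W in range(1, t_max + 1)]
    candidates += [(values_inspect(Ti, Tr, beta, s, i, r, M, R)[1], "pi_i", (Ti, Tr))
                   for Ti in range(1, t_max + 1) for Tr in range(0, t_max + 1)]
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    print(best_policy(beta=0.05, s=0.95, i=0.9, r=0.8, M=0.5, R=1.0))
```

Subtracting $1/\beta$ from the values returned here gives the improvement $\tilde V_r$, which can be compared with the closed forms (3.13) and (3.14).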
There are no great changes in the maximum time until a
catastrophic event.   Notice that there are examples where Model
5 has a longer expected time.   This may seem strange at first,
since in Model 5 we are ignoring information (the occurrence
of a successfully-dealt-with initiating event) which we use in
Model 2.   However, to counterbalance this, in Model 5 it is
implicit that after a successfully-dealt-with initiating event
the standby system is bound to be up, while in Model 2 it is
only up with probability i.   This also explains the difference
in policy for the fourth example.   Since repair and inspection
are so bad, we do nothing to interfere with it under Model 5,
but in Model 2, because after each successfully-dealt-with
initiating event there is only a .5 chance it is up, we must
keep inspecting it to see if this has occurred.   Otherwise the
only difference in policies is that the inspection intervals are
slightly longer in Model 5 than in Model 2.
4.   CONTINUOUS TIME MODEL WITH ONE UP STATE

In this section we look at the continuous time analogue of
the standby unit model described in Section 3.   Again, the
standby unit can be either 'up' or 'down', and remains down
either until it is inspected and repaired, or until a catastrophic
initiating event occurs.   An inspection takes a time of M, and
if the unit works on inspection, nothing is done, and the lifetime
of the unit thereafter is given by the distribution function
$F_i(\cdot)$.   The repair of a unit found to be 'down' on inspection
takes, together with the inspection, a time of R, and the
lifetime distribution function thereafter is $F_r(\cdot)$.   (The discrete
time models have distribution functions corresponding to a point
mass at zero together with a geometric distribution.)   The times
of the initiating events are given by a Poisson process with
parameter v (so average inter-initiating event time is $v^{-1}$).
Again, we think of an initiating event that finds the unit up
as the equivalent of an inspection.   The problem is to find the
times between inspections and between a repair and the next
inspection which maximize the expected time until a catastrophic
event.

From the work of Doshi [5] on continuous time Markov decision
processes, it follows that the optimal policy has a deterministic
time $T_i$ between inspections and a deterministic time $T_r$ between
a repair and the next inspection.   Moreover, if $V_i$ ($V_r$) are the
maximum expected time to a catastrophic event starting after
an inspection (repair), [5] implies $V_i$ and $V_r$ satisfy the
optimality equation:

$$V_x = \sup_{T_x \ge 0}\Big\{ \int_0^{T_x} v e^{-vt}\big(t + \bar F_x(t) V_i\big)\,dt + T_x e^{-vT_x}
 + e^{-vT_x}\bar F_x(T_x)\Big(\int_0^M t v e^{-vt}dt + M e^{-vM} + e^{-vM} V_i\Big)
 + e^{-vT_x} F_x(T_x)\Big(\int_0^R t v e^{-vt}dt + R e^{-vR} + e^{-vR} V_r\Big)\Big\} \tag{4.1}$$


where $\bar F(t) = 1 - F(t)$ and x = i or r.   The $T_i$ and $T_r$ that
actually maximize the R.H.S. of (4.1) are the optimal inspection
times.   Again, it is simpler to work with $\tilde V_x = V_x - 1/v$,
which is the improvement in expected time until a catastrophic
event when there is a standby system, over when there is no
standby system.   If $\tilde V_i(T_i,T_r)$, $\tilde V_r(T_i,T_r)$ are these improvements
starting from an inspection and from a repair, when the inter-inspection
time is $T_i$ and $T_r$ is the time from repair to an
inspection, we get by rearranging (4.1) that

$$\tilde V_x(T_i,T_r) = \int_0^{T_x} e^{-vt}\bar F_x(t)\,dt
 + \tilde V_i(T_i,T_r)\Big[e^{-vT_x}e^{-vM}\bar F_x(T_x) + \int_0^{T_x} v e^{-vt}\bar F_x(t)\,dt\Big]
 + \tilde V_r(T_i,T_r)\,e^{-vR}e^{-vT_x}F_x(T_x) \tag{4.2}$$

Solving the system of equations (4.2) we get

$$\tilde V_i(T_i,T_r) = A(T_i,T_r)/C(T_i,T_r); \qquad \tilde V_r(T_i,T_r) = B(T_i,T_r)/C(T_i,T_r) \tag{4.3}$$

where

$$A(T_i,T_r) = \big(1 - e^{-v(R+T_r)}F_r(T_r)\big)\int_0^{T_i} e^{-vt}\bar F_i(t)\,dt
 + e^{-v(R+T_i)}F_i(T_i)\int_0^{T_r} e^{-vt}\bar F_r(t)\,dt, \tag{4.4}$$

$$B(T_i,T_r) = \big(1 - e^{-v(M+T_i)}\bar F_i(T_i)\big)\int_0^{T_r} e^{-vt}\bar F_r(t)\,dt
 + e^{-v(M+T_r)}\bar F_r(T_r)\int_0^{T_i} e^{-vt}\bar F_i(t)\,dt, \tag{4.5}$$

$$C(T_i,T_r) = 1 - e^{-v(M+T_i)}\bar F_i(T_i) - e^{-v(R+T_r)}F_r(T_r)
 + e^{-v(M+R+T_i+T_r)}\big[F_r(T_r) - F_i(T_i)\big]
 - \big(1 - e^{-v(R+T_r)}F_r(T_r)\big)\int_0^{T_i} v e^{-vt}\bar F_i(t)\,dt
 - e^{-v(T_i+R)}F_i(T_i)\int_0^{T_r} v e^{-vt}\bar F_r(t)\,dt. \tag{4.6}$$

If there are optimal finite inspection intervals $T_i$, $T_r$, they
must satisfy, for x = i and r,

$$A'_x(T_i,T_r)/A(T_i,T_r) = B'_x(T_i,T_r)/B(T_i,T_r) = C'_x(T_i,T_r)/C(T_i,T_r) \tag{4.7}$$

where $A'_i = \partial A/\partial T_i$ and $A'_r = \partial A/\partial T_r$, etc.
In the special case where the extra time for a repair is
zero and the lifetime of the unit is the same whether an inspection
or a repair has just taken place, we can show that
the optimal inspection times are finite.   In this case
$\tilde V_i = \tilde V_r = \tilde V$, $M = R$, $F_i(\cdot) = F_r(\cdot) = F(\cdot)$ and $T_i = T_r = T$,
so (4.3) becomes

$$\tilde V(T) = A(T)/C(T) \tag{4.8}$$

where

$$A(T) = \int_0^T e^{-vt}\bar F(t)\,dt \tag{4.9}$$

$$C(T) = 1 - e^{-v(M+T)} - \int_0^T v e^{-vt}\bar F(t)\,dt \tag{4.10}$$


Lemma 4.1.

The optimal inspection time T* is finite and
$\tilde V(T^*) = \bar F(T^*)/\big(v(e^{-vM} - \bar F(T^*))\big)$.


Proof

At a local maximum or minimum $\tilde V'(T) = 0$, which implies
$h(T) = A'(T)C(T) - C'(T)A(T) = 0$ since $C(T)^2 > 0$, where

$$h(T) = e^{-vT}\Big[\bar F(T)\big(1-e^{-v(M+T)}\big) - v e^{-vM}\int_0^T e^{-vt}\bar F(t)\,dt\Big]. \tag{4.11}$$

h(0) is positive and though $h(\infty) = 0$, notice that $h(T) = e^{-vT}g(T)$
and as $T \to \infty$, $g(T) < 0$.   This shows that $T = \infty$ is a minimum
turning point and that there is a finite turning point which
is a maximum.   The stated value of $\tilde V(T^*)$ follows because at
a turning point $A(T^*)/C(T^*) = A'(T^*)/C'(T^*)$.
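As a numerical illustration of Lemma 4.1, the sketch below takes an exponential lifetime $\bar F(t) = e^{-\lambda t}$, an assumption made only for this example, for which $A(T)$ in (4.9) has a closed form. It locates the root of the bracketed factor $g(T)$ of (4.11) by bisection and compares $A/C$ with $A'/C'$ at that point. The function names and the parameter values are hypothetical.

```python
import math


def A(T, v, lam):
    """A(T) of (4.9) for the assumed exponential lifetime Fbar(t) = exp(-lam * t)."""
    return (1 - math.exp(-(v + lam) * T)) / (v + lam)


def improvement(T, v, lam, M):
    """V~(T) = A(T) / C(T), with C(T) from (4.10)."""
    return A(T, v, lam) / (1 - math.exp(-v * (M + T)) - v * A(T, v, lam))


def optimal_interval(v, lam, M, upper=1e4):
    """Bisection on g(T), the bracketed factor of h(T) in (4.11).

    For this lifetime g is strictly decreasing, positive at 0 and negative for
    large T, so the root is unique and is the maximizing interval.
    """
    def g(T):
        return (math.exp(-lam * T) * (1 - math.exp(-v * (M + T)))
                - v * math.exp(-v * M) * A(T, v, lam))

    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    v, lam, M = 0.1, 0.04, 0.035  # hypothetical rates and inspection time
    T_star = optimal_interval(v, lam, M)
    fbar = math.exp(-lam * T_star)
    print("T* =", T_star)
    print("A/C   =", improvement(T_star, v, lam, M))
    print("A'/C' =", fbar / (v * (math.exp(-v * M) - fbar)))
```

The two printed values should agree to within rounding, since $A/C = A'/C'$ at any turning point of $A/C$.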

We could repeat the whole analysis for the continuous time
analogue of the model where we ignore successfully-dealt-with
initiating events, or at least do not consider them inspections.
Using the notation of Model 1, the optimal values $V_i$
and $V_r$ satisfy

$$V_x = \sup_{T_x \ge 0}\Big\{ \int_0^{T_x} t\,dt \int_0^t f_x(u)\, v e^{-v(t-u)}\,du
 + \bar F_x(T_x)\Big[T_x + \int_0^M t v e^{-vt}dt + M e^{-vM} + e^{-vM}V_i\Big]
 + \Big(\int_0^{T_x} f_x(t)\, e^{-v(T_x-t)}\,dt\Big)\Big[T_x + \int_0^R t v e^{-vt}dt + R e^{-vR} + e^{-vR}V_r\Big]\Big\}. \tag{4.12}$$

The same analysis that led to (4.7) can be applied to (4.12) to
find the optimal $T_i$ and $T_r$.   There is a difference in the
special case when $M = R$, $F_i(\cdot) = F_r(\cdot) = F(\cdot)$, $T_i = T_r = T$
and $\tilde V_i = \tilde V_r = \tilde V$, where $\tilde V = V - 1/v$.

Now

$$\tilde V(T) = D(T)/K(T) \tag{4.13}$$

where

$$D(T) = \int_0^T \bar F(u)\,du \tag{4.14}$$

and

$$K(T) = 1 - e^{-v(M+T)}\Big(1 + v\int_0^T \bar F(u)e^{vu}\,du\Big). \tag{4.15}$$


Lemma 4.2.

In this special case of Model 2, a sufficient condition for
S*, the optimal inspection interval, to be finite is that

$$r(\infty) > \frac{v}{1 + v\mu e^{-vM}} \tag{4.16}$$

where $r(\infty) = \lim_{S\to\infty} f(S)/\bar F(S)$ and $\mu = \int_0^\infty t f(t)\,dt < \infty$ is the
expected lifetime.


Proof.

At a local maximum or minimum of $\tilde V(T)$, $\tilde V'(T) = h(T)/K(T)^2 = 0$
where

$$h(T) = \bar F(T)\Big(1 - e^{-v(M+T)} - e^{-v(M+T)}\, v\int_0^T \bar F(u)e^{vu}\,du\Big)
 - \Big(\int_0^T \bar F(u)\,du\Big) v e^{-v(M+T)}\Big(\int_0^T f(u)e^{vu}\,du + F(0)\Big). \tag{4.17}$$

Since $K(T)^2 > 0$, the condition $\tilde V'(T) = 0$ reduces to $h(T) = 0$.
Notice that $h(0) = \bar F(0)[1-e^{-vM}] > 0$ but $h(\infty) = 0$.   Thus to
ensure the maximum is not at $T = \infty$, we must show $h'(T)$ is positive
as T tends to infinity.   Differentiating h with respect
to T, it follows that as T tends to infinity

m      ?  2  -vM 

h'(T)  -►  -r(oo)  (1  +  yve  VM)  +  v  ye  VM(vb-l)  +  ^— :  (4.18) 
where

$$r(\infty) = \lim_{T\to\infty} f(T)/\bar F(T) = \lim_{T\to\infty} r(T), \qquad a = \lim_{T\to\infty} e^{vT}\bar F(T),$$

and

$$b = \lim_{T\to\infty} \Big(\int_0^T \bar F(y)e^{vy}\,dy\Big)\Big/\bar F(T)e^{vT}. \tag{4.19}$$

If $\bar F(T)e^{vT} = \exp\big(-\int_0^T (r(t)-v)\,dt\big) \to c$ as $T \to \infty$ then $b \to \infty$
and $h'(\infty)$ is positive; this certainly occurs if $r(\infty) > v$.   If
$\bar F(T)e^{vT} \to \infty$ as $T \to \infty$, then L'Hôpital's Rule says

$$b = \lim_{T\to\infty} \frac{\bar F(T)e^{vT}}{v\bar F(T)e^{vT} - f(T)e^{vT}} = \frac{1}{v - r(\infty)}. \tag{4.20}$$

Thus

$$h'(T) \to -r(\infty)\,(1 + e^{-vM}v\mu) + v^2\mu e^{-vM}\,\frac{r(\infty)}{v - r(\infty)}. \tag{4.21}$$

Since we are assuming $\bar F(T)e^{vT} \to \infty$ we have $r(\infty) \le v$.   If
$r(\infty) < v$ then on checking when (4.21) is positive we get (4.16).
Finally if $r(\infty) = v$, then $b = \infty$ and $h'(T)$ is still positive at $T = \infty$.

As an example suppose $\bar F(t) = w e^{-\lambda t}$, $t > 0$, so the unit has
exponential lifetime with a probability 1-w of instantaneous
failure; then the optimal inspection time T satisfies

-(v+A)TrA    .   ,_  , . .  -vM  -AT  ,   ,  ,,x  -(v+A)T  ,    -vM  AT,      n 
e        [  A-vw)  -  (2v+A)  e    e    +v(w+l)e        +  ve    e   ]   =   0 

(4.22) 
and the condition (4.16) that guarantees a finite solution to
this equation is $\lambda > v(1 - w e^{-vM})$.
5.   TWO-UP STATE MODEL

Model 7

We extend Model 2 of Section 3 to allow the unit to be in
either one of two different up states:   1-up and 2-up, which
have different failure rates.   Let $s_i$, i = 1,2, be the probability
of remaining in state i next period given that it is in
state i this period, so that $1-s_i$ is the probability it will fail
in the next period.   This model is intended to describe the
situation in which a repair might only correct minor faults
that caused the failure and not the underlying problem, which
caused and will continue to cause these faults.   We take as our
state space $S = \{(p,g) \mid 0 \le p \le 1,\ 0 \le g \le \infty\}$, where p is the
belief that the unit is up, and g is the ratio of the probability
the unit is in the 1-up state to the probability it is in the
2-up state.   Thus in the state (p,g) the beliefs that the unit is
down, in the 1-up state and in the 2-up state are respectively
$1-p$, $gp/(g+1)$, $p/(g+1)$.

We assume that after a repair the unit is in state (r,w) and
define $a = s_1/s_2$, where without loss of generality we assume
$s_1 \ge s_2$.   The occurrence of a successfully-dealt-with initiating
event is treated as an inspection which takes no time.   Let
V(p,g) be the maximum extra number of periods until a catastrophic
event under the best inspection policy, compared with having
no standby unit (i.e., same definition as in Section 2).
Again, Denardo's results [4] guarantee the optimal policy to
be a deterministic one; it satisfies the optimality equation

$$V(p,g) = \max\{W_1(p,g),\ W_2(p,g),\ W_3(p,g)\} \tag{5.1}$$

$$W_1(p,g) = p + (1-\beta)\,V\big(s_2 p(ag+1)/(g+1),\ ag\big) + \beta p\,V(i,g)$$

$$W_2(p,g) = p(1-\beta)^M V(i,g) + (1-p)(1-\beta)^R V(r,w)$$

$$W_3(p,g) = (1-\beta)^R V(r,w)$$


The assumption is that an inspection affects the probability
the unit is up, but not the ratio between the two up states,
whereas a repair always returns the unit to the state (r,w).
$(s_2 p(ag+1)/(g+1),\ ag)$ is the Bayesian updated belief of the state
(p,g), using the fact that no initiating event occurred.   The
optimal policy for this model is given as follows.

Theorem 5.1.

The optimal policy is given by a function p*(g) and a
number g* so that in state (p,g), it does nothing if $p > p^*(g)$,
inspects if $p \le p^*(g)$, $g > g^*$, and repairs if $p \le p^*(g)$, $g \le g^*$.

Proof

As in Theorem 3.1, an inductive proof on the iterates of
value iteration proves that V(p,g) is convex and non-decreasing
in p and non-decreasing in g.   Now define
$W_g = \{p \mid V(p,g) > W_1(p,g)\}$; then the linearity of $W_2$ and $W_3$ and
the convexity of V in p guarantee $W_g$ is convex, just as in
Theorem 3.1.   $V(0,g) = V(0,g')$ since if p = 0 there is only
one state.   From (5.1) it follows that

$$V(0,g) = \max\{(1-\beta)V(0,ag),\ (1-\beta)^N V(r,w)\}. \tag{5.2}$$

By definition $V(r,w) > 0$, and if $V(0,g) = W_1(0,g) = (1-\beta)V(0,ag) =
(1-\beta)V(0,g)$ then $V(0,g) = 0$, and hence $0 \in W_g$.   Thus $W_g = [0,p^*(g)]$
and the result holds.   g* satisfies $(1-\beta)^M V(i,g^*) = (1-\beta)^N V(r,w)$,
and since V(i,g) is non-decreasing in g this gives the
division between inspection and repair.

Again we can rewrite the state space in terms of the number
of periods since the last inspection and the last repair.   Let
$S = \{(m,n) \mid 0 \le m \le n \le \infty\}$ where (m,n) is the state which is m
periods since the end of the last inspection, or the end of
repair if it followed from the last inspection, and n non-inspection
periods since the last repair.   The state (m,n) is
equivalent to $g = a^n w$ and

$$p = \begin{cases}
s_2^m\, i\,(a^n w+1)/(a^{n-m}w+1) & \text{if } n > m, \\
s_2^m\, r\,(a^n w+1)/(a^{n-m}w+1) & \text{if } n = m.
\end{cases} \tag{5.3}$$

If we define p(m,n) according to (5.3), the optimality equation
for this state space is

$$V(m,n) = \max \begin{cases}
p(m,n) + (1-\beta)V(m+1,n+1) + \beta p(m,n)\,V(0,n) \\
p(m,n)(1-\beta)^M V(0,n) + (1-p(m,n))(1-\beta)^R V(0,0) \\
(1-\beta)^R V(0,0)
\end{cases} \tag{5.4}$$
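The belief update inside $W_1$ and the closed form (5.3) can be cross-checked numerically. In the Python sketch below, the function names and numerical values are hypothetical. It computes the one-period update both from the $(p,g)$ formula and from the explicit three-state belief, and then applies the update m times starting from $(x, a^{n-m}w)$ to recover $p(m,n)$.

```python
def update(p, g, s1, s2):
    """Belief (p, g) after one period with no initiating event, as used in W_1."""
    a = s1 / s2
    return s2 * p * (a * g + 1) / (g + 1), a * g


def update_from_three_state_belief(p, g, s1, s2):
    """The same update, written out with the 1-up and 2-up probabilities."""
    up1, up2 = g * p / (g + 1), p / (g + 1)
    up1, up2 = s1 * up1, s2 * up2  # each up state survives with its own probability
    return up1 + up2, up1 / up2


def p_closed_form(m, n, x, w, s1, s2):
    """p(m, n) from (5.3); x is i when n > m and r when n = m."""
    a = s1 / s2
    return s2**m * x * (a**n * w + 1) / (a ** (n - m) * w + 1)


if __name__ == "__main__":
    s1, s2, i, w = 0.96, 0.6, 0.9, 2.0  # hypothetical values
    m, n = 4, 7
    p, g = i, (s1 / s2) ** (n - m) * w  # state just after an inspection
    for _ in range(m):
        p, g = update(p, g, s1, s2)
    print(p, p_closed_form(m, n, i, w, s1, s2))
```

The product of the one-step factors telescopes, which is why m successive updates collapse to the single ratio in (5.3).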
Theorem 5.1 can be reinterpreted for this state space.

Corollary 5.1.

The optimal policy is given by a function m*(n) and a
number n* so that at (m,n), do nothing if $m < m^*(n)$; inspect
if $m \ge m^*(n)$, $n > n^*$; repair if $m \ge m^*(n)$, $n \le n^*$.   Notice if
$i \ge r$, n* = 0 and we always inspect.

Again we can use value iteration on a finite state approximation
of the Markov decision model given by (5.4) (see White
[22] for the bounds).   This gives us the results found in Table 5,
namely the optimal periods for inspections, counting from the
last repair.
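A minimal value-iteration sketch for (5.4) is given below in Python. It is not the procedure used to produce Table 5. In particular, the truncation rule (states beyond `n_max` are folded back onto the boundary) is a crude assumption made here instead of the bounds of White [22], and the function name and all parameter values are hypothetical.

```python
def value_iteration(beta, i, r, w, s1, s2, M, R, n_max=30, tol=1e-8):
    """Approximate V(m, n) of (5.4) on the truncated grid 0 <= m <= n <= n_max."""
    a = s1 / s2
    dM, dR = (1 - beta) ** M, (1 - beta) ** R
    states = [(m, n) for n in range(n_max + 1) for m in range(n + 1)]

    def belief(m, n):
        x = r if m == n else i  # no inspection since the repair when m == n
        return s2**m * x * (a**n * w + 1) / (a ** (n - m) * w + 1)

    p = {state: belief(*state) for state in states}
    V = dict.fromkeys(states, 0.0)
    while True:
        new, policy = {}, {}
        for m, n in states:
            nxt = (min(m + 1, n_max), min(n + 1, n_max))  # truncation (assumption)
            options = {
                "wait": p[m, n] + (1 - beta) * V[nxt] + beta * p[m, n] * V[0, n],
                "inspect": p[m, n] * dM * V[0, n] + (1 - p[m, n]) * dR * V[0, 0],
                "repair": dR * V[0, 0],
            }
            policy[m, n] = max(options, key=options.get)
            new[m, n] = options[policy[m, n]]
        gap = max(abs(new[k] - V[k]) for k in states)
        V = new
        if gap < tol:
            return V, policy


if __name__ == "__main__":
    # Hypothetical weekly parameters, for illustration only.
    V, policy = value_iteration(beta=0.1, i=0.9, r=0.9, w=2.0,
                                s1=0.96, s2=0.6, M=0.035, R=0.07)
    print("V(0,0) =", round(V[0, 0], 3))
    print([policy[m, m] for m in range(6)])
```

Each action's coefficients on V sum to less than one whenever i and r are below one, so the iteration is a contraction and the loop terminates.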

Note that the optimal inspection pattern appears to have
short inter-inspection times just after a repair, which gradually
increase to long inspection times, provided the system continues
to be found up upon inspection.   Hazardous inspection (i small)
has a more drastic effect on the expected time to a catastrophic
failure than similar changes in r, or $s_1$ and $s_2$.

[Table 5: optimal periods for inspections, counting from the last repair.]

Model 8

As in Section 3, we could also model the situation in which
the information acquired from successfully-dealt-with initiating
events is ignored.   Then $\beta$, $i$, $s_1$, $s_2$, $M$, $R$ are still defined
as in Model 7, but immediately after an inspection or repair
the time to the next inspection or repair is determined, and
which kind it will be.   Immediately after a repair suppose the
unit has probability $r_1$, $r_2$ respectively of being in the 1-up
or 2-up state.   The decision points are immediately after a
repair, and immediately after an inspection, where it is important
to know the number of operating periods n since the last repair.
We denote the maximum expected times until a catastrophic event
at these decision points as $V_r$, $V_{i,n}$ respectively.   As in Model 4
we can write down the optimality equation connecting these values:


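$$V_r = \max \begin{cases}
\max_{W_r}\Big[ L(r,W_r) + \big(1-f(r,W_r)\big)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big] \\
\max_{T_r}\Big[ L(r,T_r) + \big(r_1 s_1^{T_r} + r_2 s_2^{T_r}\big)\big((1-(1-\beta)^M)/\beta + (1-\beta)^M V_{i,T_r}\big) \\
\qquad + \big(1 - r_1 s_1^{T_r} - r_2 s_2^{T_r} - f(r,T_r)\big)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big]
\end{cases}$$

$$V_{i,n} = \max_{T_{i,n}}\Big\{ L(i(n),T_{i,n}) + \big(i(n)_1 s_1^{T_{i,n}} + i(n)_2 s_2^{T_{i,n}}\big)\big((1-(1-\beta)^M)/\beta + (1-\beta)^M V_{i,n+T_{i,n}}\big) \\
\qquad + \big(1 - i(n)_1 s_1^{T_{i,n}} - i(n)_2 s_2^{T_{i,n}} - f(i(n),T_{i,n})\big)\big((1-(1-\beta)^R)/\beta + (1-\beta)^R V_r\big) \Big\} \tag{5.5}$$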


where $r = (r_1, r_2, 1-r_1-r_2)$.   If $p = (p_1,p_2,p_3)$, where $p_1$ is the
probability of being in the 1-up state, $p_2$ is the probability of
being in the 2-up state, and $p_3$ is the probability of being
down, then

$$L(p,T) = T - p_3\sum_{k=1}^{T-1}\big(1-(1-\beta)^k\big)
 - p_1\sum_{k=1}^{T-2}\big(1-(1-\beta)^{T-k-1}\big)(1-s_1^k)
 - p_2\sum_{k=1}^{T-2}\big(1-(1-\beta)^{T-k-1}\big)(1-s_2^k) \tag{5.6}$$

is the expected time until a catastrophic failure in the first T
periods starting in state p.

$$i(n) = \Big(\frac{i\,r_1 s_1^n}{r_1 s_1^n + r_2 s_2^n},\ \frac{i\,r_2 s_2^n}{r_1 s_1^n + r_2 s_2^n},\ 1-i\Big)$$

is the state of the system after inspection n operating periods
after last repair, while

$$f(p,T) = (1-\beta)f(p,T-1) + p_1\beta(1-s_1^{T-1}) + p_2\beta(1-s_2^{T-1}) + \beta p_3 \tag{5.7}$$

is the probability there has been a catastrophic failure within
T periods, starting in state p.   Again, the general results of
Markov renewal programming [7] show that the only possible
optimal policies are $\pi_r(W)$, i.e., repair every W, or $\pi_i\{T_0,T_1,T_2,\ldots\}$,
which is inspect $T_0$ periods after a repair, and $T_k$ periods
after the k-th inspection after a repair.   In order to find the
optimal policy it is easier to work with $\tilde V = V - 1/\beta$ again,
and using (5.5) we can show that under the policy $\pi_r(W)$, if
$r = (r_1, r_2, 1-r_1-r_2)$,

$$\tilde V_r = \Big(\frac{r_1(1-s_1^W)}{1-s_1} + \frac{r_2(1-s_2^W)}{1-s_2}\Big)\Big/\Big(1 - \big(1-f(r,W)\big)(1-\beta)^R\Big). \tag{5.8}$$
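The quantities $L(p,T)$, $f(p,T)$ and the repair-only value (5.8) are straightforward to compute. The Python sketch below is illustrative: the function names, the search bound `w_max` and the parameter values are assumptions. It also prints a consistency check, namely that $L(p,T) - f(p,T)/\beta$ equals $p_1(1-s_1^T)/(1-s_1) + p_2(1-s_2^T)/(1-s_2)$, which is the identity that turns (5.5) into (5.8) once $1/\beta$ is subtracted.

```python
def f(p, T, beta, s1, s2):
    """Probability of a catastrophe within T periods from belief p, by the recursion (5.7)."""
    p1, p2, p3 = p
    total = 0.0  # f(p, 0) = 0
    for t in range(1, T + 1):
        total = (1 - beta) * total + beta * (
            p1 * (1 - s1 ** (t - 1)) + p2 * (1 - s2 ** (t - 1)) + p3)
    return total


def L(p, T, beta, s1, s2):
    """Expected time until a catastrophe within the first T periods, as in (5.6)."""
    p1, p2, p3 = p
    down = sum(1 - (1 - beta) ** k for k in range(1, T))

    def up(s):
        return sum((1 - (1 - beta) ** (T - k - 1)) * (1 - s**k) for k in range(1, T - 1))

    return T - p3 * down - p1 * up(s1) - p2 * up(s2)


def improvement_repair_every(W, r1, r2, beta, s1, s2, R):
    """V~_r under pi_r(W), following (5.8)."""
    r = (r1, r2, 1 - r1 - r2)
    gain = r1 * (1 - s1**W) / (1 - s1) + r2 * (1 - s2**W) / (1 - s2)
    return gain / (1 - (1 - f(r, W, beta, s1, s2)) * (1 - beta) ** R)


def best_repair_interval(r1, r2, beta, s1, s2, R, w_max=100):
    """Repair interval W in 1..w_max with the largest improvement."""
    return max(range(1, w_max + 1),
               key=lambda W: improvement_repair_every(W, r1, r2, beta, s1, s2, R))


if __name__ == "__main__":
    beta, s1, s2, R = 0.1, 0.96, 0.6, 0.07  # hypothetical values
    p, T = (0.6, 0.3, 0.1), 9
    lhs = L(p, T, beta, s1, s2) - f(p, T, beta, s1, s2) / beta
    rhs = p[0] * (1 - s1**T) / (1 - s1) + p[1] * (1 - s2**T) / (1 - s2)
    print(lhs, rhs)
    print(best_repair_interval(0.6, 0.3, beta, s1, s2, R))
```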


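Under the policy $\pi_i(T_0,T_1,\ldots)$ we get the following equations

$$\tilde V_r = \frac{r_1(1-s_1^{T_0})}{1-s_1} + \frac{r_2(1-s_2^{T_0})}{1-s_2}
 + \big(r_1 s_1^{T_0} + r_2 s_2^{T_0}\big)\big[(1-\beta)^M \tilde V_{i,T_0}\big]
 + \big(1 - r_1 s_1^{T_0} - r_2 s_2^{T_0} - f(r,T_0)\big)(1-\beta)^R \tilde V_r. \tag{5.9}$$

If $\tau_{k-1} = T_0 + T_1 + \cdots + T_{k-1}$,

$$\tilde V_{i,\tau_{k-1}} = \frac{i(\tau_{k-1})_1(1-s_1^{T_k})}{1-s_1} + \frac{i(\tau_{k-1})_2(1-s_2^{T_k})}{1-s_2}
 + \big(i(\tau_{k-1})_1 s_1^{T_k} + i(\tau_{k-1})_2 s_2^{T_k}\big)(1-\beta)^M \tilde V_{i,\tau_k}
 + \big(1 - i(\tau_{k-1})_1 s_1^{T_k} - i(\tau_{k-1})_2 s_2^{T_k} - f(i(\tau_{k-1}),T_k)\big)(1-\beta)^R \tilde V_r. \tag{5.10}$$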


It appears somewhat difficult to solve (5.9) and (5.10) as we
have an infinite set of equations.   However, we can assume for
all $\tau_k \ge N$, for some N, $\tilde V_{i,\tau_k}$ is approximately constant, since
if a large number of periods have passed since the last repair,
with no intervening failure, it is a good approximation to
assume the unit is in the better of the two up states.   This
enables us to solve these equations using the bisection method
reviewed in Thomas [19].   The method depends on the fact that if we
substitute $\tilde V_r = c$ in the R.H.S. of (5.9) and (5.10) we can work
back and solve for $\tilde V_r$ on the L.H.S. of (5.9).   If c is the
correct value of $\tilde V_r$, the L.H.S. of (5.9) is c, but if $c > \tilde V_r$,
it follows easily that the L.H.S. of (5.9) will be greater than
c, while if $c < \tilde V_r$ it will be smaller than c.   Using this as
the basis of the bisection method and taking all inspections
more than 50 periods after a repair as the same, we get the
forms of the approximately optimal policies found in Table 6
(the units of time are weeks).
The parameters in the comparable continuous time model of
Section  2  are  (in  units  of  weeks)  :   v  =  01,  tt,  =  ~   =  1  -  tt  , 
61  =  .04,  62  =  0.5,  M  =  0.035  and  R  =  0.07.   The  corresponding 
best policies under the "short-long" inspection rule of Section
2, with inter-inspection times restricted to being multiples
of a week, are as follows:


Table 7

                                            Best expected time to
Case    OKI     OKR     Best Policy         a catastrophic event

I       0.9     0.9     1 (4 times), 2            61.09
II      0.9     0.5     1 (2 times), 3            42.19
III     0.5     0.9     3 (1 time), 1             29.37
IV      0.5     0.5     3 (1 time), ∞             19.98


The difference in policies for Case III results from the
fact that the discrete time model allows a decision of repair
without inspection.   The differences in the policies for Cases
I, II, and IV come about because the continuous time model only
allows inspection periods of two different lengths, whereas
the optimal policy in the discrete time model goes gradually
from the length of the inspection period just after a repair
to an asymptotic inspection period if the inspections are
successful.   However, subject to its restrictions, the policy
of the continuous time model is comparable to that of the discrete
time model.
The differences between the best expected times to a
catastrophic event in the two models result from the discretization
of time in Model 8.   If the time interval in the discrete
time model of Case I is taken to be 1/10 week instead of 1 week,
with the resulting change of parameters $\beta$ = .01, i = 0.9,
$r_1$ = 0.6, $r_2$ = 0.3, $s_1$ = .996, $s_2$ = .95, M = 0.35, R = 0.7,
then the optimal policy is inspect 7 periods after a repair,
and if up, then 8 periods later, then 9, 11, 13, 16, 18 and
20 periods, and the expected time until catastrophic failure is
626.0 periods.   In the original time scale this is a time of
62.6 weeks.   Note that the difference between the expected times
to a catastrophic event is now small for the two models.   This
suggests that the policy that was proposed in Section 2, while
not optimal, is a good one.
6.   CONCLUSIONS

The following conclusions can be drawn about the form of
the optimal policy, by studying the models in this paper.

1)  If the failure rate of the system increases with age,
then the inspection intervals should decrease, and do.   Numerical
examples based on Model 1 have borne this out.   The model
calculations suggest optimal intervals based on the underlying
parameters.

2)  If there is only one state the unit can be in when it
is 'up', and the probability of being up, i, is the same after
each inspection and the probability of being up after a repair
is also a constant r, then the optimal policy is to have one
'short' inspection interval after a repair, and a 'longer'
inspection interval always thereafter (i > r), or else to repair
at fixed intervals with no inspection (r considerably larger
than i).   The 'longer' inspection interval must always be at least as
long as the 'short' initial inspection interval.

3)  The results of 1) and 2) hold whether or not successfully-dealt-with
initiating events are considered as a type of inspection.
However, there are considerable differences in the actual inspection
periods for these two cases.

4)  In order for the optimal inspection problem to require
several 'short' inspection intervals followed by longer ones,
it is necessary to assume the unit can be in more than one 'up'
state with different failure rates.   In this case there is
not an abrupt jump from 'short' inspection intervals to 'long',
but a gradual increase in the inspection interval.   However,
there is a suggestion that a policy comparable to the optimal
one, in which there is a sharp jump between short inspections
and long ones, will give an expected time to a catastrophic
event that is close to that achieved by the optimal policy.